Out-of-bag estimation of the optimal sample size in bagging

Authors

  • Gonzalo Martínez-Muñoz
  • Alberto Suárez
Abstract

The performance of m-out-of-n bagging with and without replacement is analyzed as a function of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set (m_wr = n). Without-replacement methods typically use half samples (m_wor = n/2). These choices of sample size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose to use the out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size is chosen to minimize the out-of-bag error of the ensemble generally improve on the performance of standard bagging and can be built efficiently.
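The selection procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`oob_error`, `select_sampling_ratio`, `make_toy_data`), the candidate ratios, and the ensemble size are all hypothetical, and a 1-nearest-neighbour rule stands in for the base classifier only to keep the example dependency-free. For each candidate ratio m/n, an ensemble is built on samples of size m, each training point is classified by majority vote of the members that did not see it, and the ratio with the lowest out-of-bag error is kept.

```python
import random
from collections import Counter

def nearest_neighbor_predict(train, x):
    """Assumed base learner: label of the closest training point (1-NN)."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def oob_error(data, m, n_estimators, replace, rng):
    """Out-of-bag error of an ensemble whose members each see m sampled points."""
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        if replace:
            idx = [rng.randrange(n) for _ in range(m)]  # with replacement
        else:
            idx = rng.sample(range(n), m)  # without replacement; needs m <= n
        in_bag = set(idx)
        sample = [data[i] for i in idx]
        # Each member votes only on the points it was not trained on.
        for i in range(n):
            if i not in in_bag:
                votes[i][nearest_neighbor_predict(sample, data[i][0])] += 1
    scored = [(v.most_common(1)[0][0], data[i][1]) for i, v in enumerate(votes) if v]
    if not scored:
        return float("inf")  # no out-of-bag points, so this ratio cannot be scored
    return sum(pred != y for pred, y in scored) / len(scored)

def select_sampling_ratio(data, ratios, n_estimators=25, replace=True, seed=0):
    """Return the ratio m/n with the lowest out-of-bag error, plus all errors."""
    rng = random.Random(seed)
    n = len(data)
    errors = {}
    for r in ratios:
        m = max(1, round(r * n))
        errors[r] = oob_error(data, m, n_estimators, replace, rng)
    best = min(errors, key=errors.get)
    return best, errors

def make_toy_data(n_per_class=30, seed=1):
    """Two overlapping 2-D Gaussian classes, for illustration only."""
    rng = random.Random(seed)
    zeros = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_per_class)]
    ones = [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(n_per_class)]
    return zeros + ones

if __name__ == "__main__":
    best, errors = select_sampling_ratio(make_toy_data(), [0.1, 0.2, 0.5, 1.0])
    print("selected ratio:", best)
    print("out-of-bag errors:", errors)
```

Because the out-of-bag votes come from the training data itself, no separate validation set is needed to compare the candidate ratios. In this sketch, sampling without replacement at ratio 1 leaves no out-of-bag points, so that ratio is scored as infinite and never selected.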


Similar resources

Small-Sample Error Estimation for Bagged Classification Rules

Application of ensemble classification rules in genomics and proteomics has become increasingly common. However, the problem of error estimation for these classification rules, particularly for bagging under the small-sample settings prevalent in genomics and proteomics, is not well understood. Breiman proposed the “out-of-bag” method for estimating statistics of bagged classifiers, which was s...


Trimmed bagging

Bagging has been found to be successful in increasing the predictive performance of unstable classifiers. Bagging draws bootstrap samples from the training sample, applies the classifier to each bootstrap sample, and then averages over all obtained classification rules. The idea of trimmed bagging is to exclude the bootstrapped classification rules that yield the highest error rates, as estimat...


The Effect of Estimation Error on Risk-adjusted Bernoulli GEWMA Control Chart in Multistage Healthcare Processes

Background and objectives: The risk-adjusted Bernoulli control chart is one of the main tools for monitoring multistage healthcare processes to achieve higher performance and effectiveness in healthcare settings. Using parameter estimates can significantly deteriorate chart performance. However, so far, the effect of estimation error on this chart in which healthcare ...


Determining the Sample size for Estimation of the CCC-R Control Chart Parameters Based on Estimation Costs

In today's highly competitive industrial environment, driven by fast technology development, quality practitioners want to detect out-of-control situations and take action as soon as possible whenever necessary. Accordingly, new statistical procedures have been continually developed both to handle high-yield processes and to minimize all quality costs. CCC-r chart, th...



Journal:
  • Pattern Recognition

Volume 43

Publication date: 2010